Artificial Intelligence in the Life Sciences
Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Artificial Intelligence in the Life Sciences's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Liu, T.; Jiang, S.; Zhang, F.; Sun, K.; Head-Gordon, T.; Zhao, H.
Large language models (LLMs) are in the ascendancy for research in drug discovery, offering unprecedented opportunities to reshape drug research by accelerating hypothesis generation, optimizing candidate prioritization, and enabling more scalable and cost-effective drug discovery pipelines. However, there is currently a lack of objective assessments of LLM performance to ascertain their advantages and limitations relative to traditional drug discovery platforms. To tackle this emerging problem, we have developed DrugPlayGround, a framework to evaluate and benchmark LLM performance in generating meaningful text-based descriptions of physicochemical drug characteristics, drug synergism, drug-protein interactions, and the physiological response to perturbations introduced by drug molecules. Moreover, DrugPlayGround is designed to work with domain experts to provide detailed explanations that justify the predictions of LLMs, thereby testing LLMs for chemical and biological reasoning capabilities and promoting their wider use at every stage of drug discovery.
Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.
Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.
Potter, H. G.
Generative artificial intelligence (genAI) tools are increasingly used by prospective higher education (HE) applicants seeking guidance on university and programme selection. Despite rapidly expanding use, little is known about how genAI systems may introduce or amplify bias in undergraduate admissions decision-making. Here, we systematically examined patterns of bias across three widely used genAI chatbots (ChatGPT, Copilot, Gemini) using neuroscience as a representative UK undergraduate programme. We constructed 216 prompts that varied by applicant characteristics (e.g. gender, study type, academic attainment). Each prompt was submitted to all three chatbots, generating 648 responses and 3240 individual programme recommendations. Output responses underwent text analysis (e.g. n-grams, gender-coded language), and national HE markers of esteem (REF21, TEF23, NSS24) were analysed. Applicant grades and priorities produced the strongest effects on genAI outputs. Higher-grade applicants and those prioritising research received significantly more masculine-coded language, independent of applicant gender. N-gram patterns also diverged: high-grade prompts more frequently elicited terms relating to excellence and research intensity, whereas lower-grade prompts produced greater emphasis on widening access. Recommendations were systematically skewed, with higher grades, private schooling, and research-focused priorities increasing the likelihood of recommending elite institutions and programmes with higher entry requirements. Critically, the gender-coded language of outputs predicted institutional characteristics: masculine-coded responses were associated with recommendations featuring higher entry thresholds and stronger research performance, while feminine-coded responses favoured institutions with higher student satisfaction. These findings reveal clear, systematic biases in how genAI guides prospective HE applicants. 
Such biases risk reinforcing existing educational and socioeconomic inequalities, underscoring the need for transparency, regulation, and oversight in the use of genAI within HE decision-making. Highlights: GenAI is widely used by HE applicants despite little study of its biases. 216 prompts across 3 chatbots generated 3240 programme suggestions. Grades and priorities drove major shifts in language and recommendations. Gender-coded wording mapped onto research strength and entry standards. GenAI biases may reinforce inequalities in HE admissions decision-making.
Ulusoy, E.; Bostanci, S.; Deniz, B. E.; Dogan, T.
Motivation: Molecular representation learning is central to computational drug discovery. However, most existing models rely on single-modality inputs, such as molecular sequences or graphs, which capture only limited aspects of molecular behaviour. Yet unifying these modalities with complementary resources such as textual descriptions and biological interaction networks into a coherent multimodal framework remains non-trivial, hindering more informative and biologically grounded representations. Results: We introduce SELFormerMM, a multimodal molecular representation learning framework that integrates SELFIES notations with structural graphs, textual descriptions, and knowledge graph-derived biological interaction data. By aligning these heterogeneous views, SELFormerMM effectively captures complementary signals that unimodal approaches often overlook. Our performance evaluation has revealed that SELFormerMM outperforms structure-, sequence-, and knowledge-based models on multiple molecular property prediction tasks. Ablation analyses further indicate that effective cross-modal alignment and modality coverage improve the model's ability to exploit complementary information. Overall, integrating SELFIES with structural, textual, and biological context enables richer molecular representations and provides a promising framework for hypothesis-driven drug discovery. Availability: SELFormerMM is available as a programmatic tool, together with datasets, pretrained models, and precomputed embeddings at https://github.com/HUBioDataLab/SELFormerMM. Contact: tuncadogan@gmail.com
Ahmadov, A.; Ahmadov, O.
Bone morphogenetic protein receptor type IA (BMPR1A) is a key mediator of chondrogenesis and a validated therapeutic target for cartilage repair, yet existing BMP mimetic peptides suffer from low potency and the full-length protein (rhBMP-2) carries significant safety risks. Generative AI tools for protein design can now produce de novo peptide binders, but none have been applied to cartilage regeneration targets. Here, we benchmarked four architecturally distinct AI tools--RFdiffusion, BindCraft, PepMLM, and RFpeptides--to design candidate BMPR1A-binding peptides. We generated 192 candidates alongside 98 negative controls (290 total) and evaluated all complexes using AlphaFold 3 structure prediction, dual physics-based energy scoring (PyRosetta and FoldX), and contact recapitulation against the crystallographic BMP-2:BMPR1A interface (PDB: 1REW). A four-metric composite ranking identified a 15-residue PepMLM design (pepmlm_L15_0026) as the top candidate, combining favorable binding energy (PyRosetta dGseparated = -45.9 REU; FoldX ΔG = -19.4 kcal/mol) with the highest contact recapitulation among top-ranked peptides (11/30 gold-standard interface residues). Designed candidates significantly outperformed controls on ipTM (p = 0.002) and FoldX ΔG (p < 0.001). BindCraft candidates achieved the highest structural confidence (ipTM up to 0.81) but exhibited moderate contact recapitulation (mean 0.224), consistent with the computational hypothesis that they may engage alternative BMPR1A binding surfaces rather than the native BMP-2 interface. Physicochemical filtering yielded a shortlist of 54 candidates across all four tools. These results establish a reproducible computational framework for AI-guided peptide design targeting cartilage regeneration and identify specific candidates for future experimental validation via binding assays and chondrocyte differentiation studies. 
Author summary: Damaged cartilage has limited capacity to heal, and current biological therapies based on bone morphogenetic protein 2 (BMP-2) carry serious safety concerns including ectopic bone formation and inflammation. Short peptides that mimic BMP-2's interaction with its receptor BMPR1A could offer a safer, more targeted alternative, but designing such peptides from scratch is challenging. We used four different artificial intelligence tools--each employing a distinct computational strategy--to generate 192 candidate peptides designed to bind BMPR1A. We then evaluated all candidates using multiple independent computational methods to assess binding quality, energy favorability, and whether each peptide targets the correct site on the receptor. Our analysis identified a shortlist of 54 promising candidates, with a 15-residue peptide from the language model-based tool PepMLM emerging as the top-ranked design. We also found evidence that one tool (BindCraft) may produce peptides that bind BMPR1A at sites different from the natural BMP-2 interface, highlighting the importance of validating not just whether a peptide binds, but where it binds. Our computational framework and candidate peptides provide a foundation for future laboratory testing toward cartilage repair therapies.
Joshi, S.; Sowdhamini, R.
Motivation: Characterizing atomic-level stability and cooperative interaction networks is essential for understanding protein function and evolution. However, existing tools often lack the precision to integrate detailed physicochemical energies with higher-order graph-theoretic analyses. Results: We present HORI-EN, an updated implementation of the HORI framework, featuring hybrid energetic scoring (Physicochemical + Knowledge-Based) and a Normalized Interaction Score (NIS) based on cumulative distribution functions. HORI-EN identifies higher-order cliques of interacting residues, revealing cooperative stabilization networks. Validation on the SKEMPI v2 dataset demonstrates that HORI-EN shows discriminative performance in identifying mutational hotspots, achieving an ROC-AUC of 0.780 on the full dataset and 0.844 on a clean benchmark. Enrichment analysis indicates a 3.1-fold increase in precision for the top 1% of predictions. Furthermore, analysis of the residue interaction network recovers 77.4% of non-contacting hotspots by identifying one-hop bridging interactions to the partner chain. Beyond hotspot prediction, HORI-EN distinguishes native structures from decoys and captures conserved energetic signatures in evolutionary case studies of serine proteases and lipases. Availability and Implementation: The web server is freely available at https://caps.ncbs.res.in/HORI-EN and source code is available at https://github.com/thesixeyedknight/HoriPy. Contact: mini@ncbs.res.in
Chowdhury, T. D.; Shafoyat, M. U.; Hemel, N. H.; Nizam, D.; Sajib, J. H.; Toha, T. I.; Nyeem, T. A.; Farzana, M.; Haque, S. R.; Hasan, M.; Siddiquee, K. N. e. A.; Mannoor, K.
Alzheimer's disease remains a major therapeutic challenge, and no β-secretase (BACE1) inhibitor has achieved clinical approval. A key limitation of prior discovery efforts is reliance on single-parameter optimization, often resulting in candidates with limited translational potential. In this study, we developed a biology-informed computational framework integrating meta-ensemble QSAR modeling, molecular docking, Protein Language Model (ESM-1b)-guided residue interaction weighting, and ADMET profiling within a normalized multi-parameter ranking scheme. Model performance was validated using cross-validation, external validation, and Y-randomization (n = 100; p = 0.009), while applicability domain analysis based on Tanimoto similarity highlighted reduced reliability for extrapolative predictions. Sensitivity analysis showed high ranking stability under moderate perturbations (Spearman ρ = 0.998 for ±10%; 0.963 for ±25%), with reduced agreement under randomized weighting (ρ = 0.821), indicating that prioritization is robust but influenced by weight selection. Screening of 16,196 compounds identified 153 predicted actives (accuracy = 0.852; ROC-AUC = 0.920), which were refined to 111 candidates and seven prioritized leads. Molecular dynamics simulations (200 ns) indicated stable binding and persistent catalytic interactions, with Mol-2 showing favorable dynamic stability and ADMET characteristics. Overall, this study presents an interpretable and quantitatively evaluated framework for multi-parameter compound prioritization, supporting more reliable virtual screening in early-stage CNS drug discovery.
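The Tanimoto-based applicability domain check mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: fingerprints are represented as plain sets of "on" bit indices, and the function names and the 0.3 threshold are hypothetical choices made only for illustration.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def in_applicability_domain(query: set, training_set: list, threshold: float = 0.3):
    """Return (inside_domain, nearest_similarity) for a query fingerprint.

    A query counts as inside the domain when its most similar training
    compound reaches the (hypothetical) similarity threshold.
    """
    nearest = max((tanimoto(query, fp) for fp in training_set), default=0.0)
    return nearest >= threshold, nearest


# Toy fingerprints: the query shares 2 of 4 distinct bits with the first compound.
training = [{2, 3, 4}, {7, 8}]
inside, nearest = in_applicability_domain({1, 2, 3}, training)
```

A compound whose nearest training neighbour falls below the threshold would be treated as an extrapolative prediction, which is the case the abstract flags as less reliable.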
Hemedan, A. A.
Background: Clinical digital twins hold considerable promise for forecasting disease progression, yet the question of when a model's outputs should be withheld remains largely unaddressed. A predictive model qualifies as a governed reporting system only when it specifies the operational boundaries under which its outputs are reliable and enforces criteria for suppressing results that fall outside those bounds. Methods: We present a governed Bayesian digital twin for multi-domain Parkinson's disease (PD) progression, tracking motor function (MDS-UPDRS Part III), cognition (Montreal Cognitive Assessment, MoCA), and autonomic function (SCOPA-AUT). A monotone latent state-space model captures disease progression under four architectural constraints: non-decreasing latent severity, visit-triggered updating, full posterior uncertainty propagation, and non-causal scope. A six-rule confidence gate evaluates each forecast before release; when evidence is insufficient, the gate suppresses the output and returns a structured reason code. We evaluated the framework on the Parkinson's Progression Markers Initiative (PPMI), a multicentre longitudinal observational study (N=4,628 participants; 28,185 visits), using five-fold cross-validation with independent model refits, equity analysis, and coupling-topology sensitivity assessment. The framework is available at https://gitlab.com/ahmed.hemedan/symphony-dt, with a research prototype at https://symphony-dt.com/. Results: Predictive interval coverage at the 95% level ranged from 94% to 96% across all three endpoints, compared with 64-69% for linear mixed-effects baselines. The confidence gate released governed forecasts at 32.7% of visits under strict three-domain requirements, increasing to 48.1% under a validated partial-observation extension. Suppression was predominantly driven by incomplete clinical assessment (51.5%) rather than model uncertainty (0.2%), and operated equitably across sexes (Cramér's V=0.049). 
Five of six cross-domain coupling parameters were identified from the data (sign probability ≥ 0.99; contraction ratios 0.19-0.35), with all cross-domain forecast correlations matching the directions predicted by the coupling topology. The framework's own diagnostics localised two observation-model limitations, prodromal motor heteroscedasticity and medication-burden sensitivity, to a single model layer and specified their resolution. Conclusions: Governed silence, defined as the rule-based suppression of predictions when reliability conditions are not met, can be embedded in clinical prediction architecture, quantified as a pipeline output, and audited for equity. This work demonstrates the technical executability of governed digital twin architecture at cohort scale and provides a foundation for prospective deployment under routine clinical conditions.
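The predictive interval coverage reported in this abstract is, in its usual empirical form, the fraction of observed outcomes that fall inside their forecast intervals. The Python sketch below shows that calculation on made-up numbers; the function name and the toy data are assumptions for illustration and are not taken from the symphony-dt code.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observations lying inside their [lower, upper] interval."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have equal length")
    if not observed:
        raise ValueError("at least one observation is required")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


# Toy example: three of the four observations fall inside their intervals.
coverage = interval_coverage(
    observed=[1.0, 5.0, 10.0, 20.0],
    lower=[0.0, 4.0, 11.0, 15.0],
    upper=[2.0, 6.0, 12.0, 25.0],
)
```

For well-calibrated 95% intervals this fraction should sit close to 0.95 on held-out visits; values far below that, such as the 64-69% quoted for the baselines, indicate intervals that are too narrow.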
Chen, Y. G.; Chung, W.-Y.; Chang, K. Y.
Accurate protein subcellular localization is essential for biological function, and mislocalization is linked to numerous diseases. While current methods like DeepLoc 2.0 employ lightweight fine-tuning of protein language models (PLMs), their ability to predict multi-compartment localization remains limited. To address this, we introduce DualLoc, a multi-label localization predictor for ten compartments. DualLoc leverages full-parameter fine-tuning of a cascaded dual-transformer architecture, built upon foundational PLMs and augmented with attention and dropout layers. We evaluated this framework using three foundational PLMs--ProtBERT, ESM-2, and ProtT5--as backbones. Cross-validation on Swiss-Prot and independent validation on the Human Protein Atlas demonstrate consistent superiority over state-of-the-art baselines. The best-performing variant, DualLoc-ProtT5, achieves 0.5872 accuracy, 0.8271 micro-F1, and 0.7811 macro-F1, with substantial gains in the Matthews correlation coefficient for the nucleus (+0.13), cell membrane (+0.13), and extracellular space (+0.07). Pointwise mutual information analysis of model outputs reveals biologically relevant compartment couplings, notably between the Golgi apparatus and endoplasmic reticulum (PMI = 0.25, P < 10^-6), accurately reflecting secretory pathway coordination. DualLoc provides both a highly accurate predictive tool and a robust framework for investigating protein multi-localization mechanisms. Author summary: Where a protein resides within a cell determines what it does. When proteins end up in the wrong location, normal cellular function breaks down--a misplacement linked to diseases like cancer and Alzheimer's. While computational tools exist to predict these locations, accurately tracking proteins that multitask across multiple cellular compartments simultaneously remains a major challenge. 
We developed DualLoc, a new approach that predicts protein locations across ten different cellular compartments, from the nucleus to the cell membrane. By training an advanced artificial intelligence model on large protein sequence databases, our method more accurately identifies where proteins go, especially in complex, multi-location scenarios. Importantly, our analysis revealed meaningful biological patterns. We found strong predictive links between compartments that work closely together, such as the Golgi apparatus and the endoplasmic reticulum--two organelles that coordinate protein processing and transport. This suggests our model captures genuine cellular logic rather than simply memorizing data. By improving how we predict protein localization, DualLoc helps researchers better understand normal cellular function and disease mechanisms. Our method is freely available to the biomedical community.
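The pointwise mutual information (PMI) analysis described in this abstract can be sketched as follows. PMI for two compartments is log[p(i, j) / (p(i) p(j))], estimated here from a binary multi-label prediction matrix. The abstract does not state the logarithm base or the estimation details, so the base-2 logarithm, the function name, and the toy matrix are assumptions.

```python
import math


def pmi(labels, i, j):
    """PMI (base 2, an assumption) between label columns i and j.

    `labels` is a list of rows, each a list of 0/1 predictions per compartment.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must contain at least one row")
    p_i = sum(row[i] for row in labels) / n
    p_j = sum(row[j] for row in labels) / n
    p_ij = sum(1 for row in labels if row[i] and row[j]) / n
    if p_ij == 0:
        return float("-inf")  # the two labels never co-occur
    return math.log2(p_ij / (p_i * p_j))


# Toy predictions for two compartments that always co-occur.
coupled = [[1, 1], [1, 1], [0, 0], [0, 0]]
# Toy predictions for two compartments that are statistically independent.
independent = [[1, 1], [1, 0], [0, 1], [0, 0]]
```

Positive PMI means two compartments are predicted together more often than independence would suggest, which is the sense in which the Golgi and endoplasmic reticulum coupling is reported.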
Li, H.; Yu, Y.; Bhandarkar, A.; Kumar, R.; Clark, I. H.; Hu, Y.; Cao, W.; Zhao, N.; LI, F.; Tao, C.
Objective: Behavioral and social factors (BSFs) substantially influence the risk, onset, and progression of Alzheimer disease and related dementias (ADRD). A systematic representation of their interplay is essential for advancing prevention and targeted interventions. However, BSF-related knowledge is scattered across heterogeneous sources, limiting scalable evidence synthesis and computational analysis. To address this, we created a Behavioral Social Data and Knowledge Ontology for ADRD (BSOAD) to represent and integrate BSFs with respect to ADRD. Materials and Methods: BSOAD was developed following established ontology design principles, prioritizing reuse of existing ontology elements to ensure semantic interoperability. It was built upon the Social Determinants of Health Ontology (SDoHO) and the Drug-Repurposing Oriented Alzheimer Disease Ontology (DROADO). BSF-related classes were enriched with ICD-10-CM Z55-Z65 codes and ADRD-related classes with AD Onto. Relationships between BSFs and ADRD were derived through literature mining. Ontology quality was evaluated through Hootation-based expert review and an LLM-assisted framework assessing structural coverage and semantic coherence. Results: BSOAD contains 2275 classes, 153 object properties, and 49 data properties. Expert review demonstrated strong rational agreement (0.95), with disagreements resolved through discussion. LLM-based evaluation showed high category coverage rates (≥ 0.97) and robust semantic alignment with the relevant literature (average completeness = 0.79; conciseness = 0.94). Discussion and Conclusion: BSOAD is, to our knowledge, the first ontology to systematically represent BSFs and hierarchically model their interrelationships in ADRD. It establishes a semantic backbone for computational analysis and knowledge integration. The LLM-assisted evaluation framework demonstrates the feasibility of scalable, automated ontology assessment.
Tan, S.; Tian, Z.
The rapid advancement of AI research automation systems--including AI Scientist, data-to-paper, and Agent Laboratory--has demonstrated the potential for autonomous scientific discovery. However, existing benchmarks for evaluating these systems focus predominantly on fundamental sciences (machine learning, physics, chemistry), overlooking the unique challenges of medical clinical research: complex survey designs, inferential statistics with confounding control, adherence to reporting standards (STROBE, CONSORT), and the requirement for clinically actionable interpretation. We present MedResearchBench, the first benchmark specifically designed to evaluate AI systems on medical clinical research tasks. MedResearchBench comprises 16 tasks spanning 7 clinical domains (cardiovascular, oncology, mental health, metabolic, respiratory, neurology, infectious disease), built on publicly available datasets (the National Health and Nutrition Examination Survey [NHANES] and the Surveillance, Epidemiology, and End Results [SEER] program) with ground truth from 16 high-quality published papers (IF range: 2.3-51.0). Each task is evaluated along 6 medical-specific dimensions: statistical methodology, results accuracy, visualization quality, clinical interpretation, confounding sensitivity, and reporting compliance. We describe the benchmark design rationale, task construction methodology, paper selection criteria with anti-paper-mill filtering, and a detailed analysis of task characteristics including methodological diversity, evaluation dimension coverage, and difficulty stratification. To demonstrate benchmark executability, we evaluate an agentic data2paper pipeline on 3 pilot tasks spanning all three difficulty tiers, achieving scores of 72/100 (Tier 1, Cardio_000), 69/100 (Tier 2, Mental_000), and 75/100 (Tier 3, Metabolic_002), with a mean score of 72/100 (B-level). 
Survey-weighted methodology was correctly implemented across all tasks; primary limitations included covariate incompleteness and reference group misspecification. MedResearchBench addresses a critical gap in AI research evaluation and provides a standardized, community-extensible platform for assessing whether AI systems can conduct clinically sound, publication-quality medical research. All task materials are publicly available at https://github.com/TerryFYL/MedResearchBench.
Wang, Z.; Peng, Y.; Zhou, J.-G.; Bu, X.; Zhao, Y.; Li, Z.; Yan, B.; Sun, Y.; Wang, C.; Shu, C.; Cui, Y.; Wang, S.
Background: The FDA Adverse Event Reporting System (FAERS) is a critical pillar of post-marketing pharmacovigilance; however, its utility is constrained by data heterogeneity, pervasive reporting redundancies, and inconsistent medical terminology. These structural barriers impede reproducible, large-scale analyses and the implementation of precision drug safety surveillance. Methods: We developed faers, an open-source R package that delivers a standardized framework and an end-to-end workflow for transforming raw FAERS data into analysis-ready formats. The package implements a regulatory-compliant multi-level deduplication strategy, automated MedDRA terminology mapping, and an R S4-based object-oriented system to ensure data integrity, traceability, and efficient management of complex relational structures. It further integrates a full suite of disproportionality signal detection methods, including the Reporting Odds Ratio (ROR), Proportional Reporting Ratio (PRR), Bayesian Confidence Propagation Neural Network (BCPNN), and Empirical Bayes Geometric Mean (EBGM). Performance was benchmarked on large-scale FAERS datasets, and validity was confirmed by replicating published findings on anti-PD-1/PD-L1-associated cardiotoxicity and CAR-T cell therapy outcomes, with additional application to immune-related adverse events (irAEs). Findings: The package demonstrated high computational efficiency and near-linear scalability when processing extensive quarterly FAERS data. Validation analyses of two case studies showed excellent concordance with prior literature. Application to an irAE cohort further identified a statistically significant age-by-sex interaction in risk patterns, demonstrating the tool's ability to uncover nuanced demographic signals that are often missed by conventional approaches. Interpretation: The faers package provides a transparent, scalable, and fully reproducible framework for FAERS-based pharmacovigilance. 
By automating data cleaning, standardization, and advanced signal detection, it lowers technical barriers for researchers and regulators while promoting high-quality, open pharmacoepidemiological research to strengthen drug safety monitoring.
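Two of the disproportionality measures named in this abstract, the Reporting Odds Ratio (ROR) and the Proportional Reporting Ratio (PRR), follow from a 2x2 contingency table of report counts. The Python sketch below uses the standard textbook formulas; it is not the API of the faers R package, and the function names and counts are hypothetical.

```python
import math

# 2x2 table of report counts (standard layout):
#   a: target drug, target event      b: target drug, other events
#   c: other drugs, target event      d: other drugs, other events


def ror(a, b, c, d):
    """Reporting Odds Ratio: (a / b) / (c / d)."""
    return (a * d) / (b * c)


def prr(a, b, c, d):
    """Proportional Reporting Ratio: [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def ror_ci95(a, b, c, d):
    """Approximate 95% confidence interval for the ROR on the log scale."""
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ror = math.log(ror(a, b, c, d))
    return math.exp(log_ror - 1.96 * se), math.exp(log_ror + 1.96 * se)


# Made-up counts, for illustration only.
counts = (20, 80, 10, 890)
ror_value = ror(*counts)
prr_value = prr(*counts)
ci_low, ci_high = ror_ci95(*counts)
```

A common screening convention treats a signal as present when the lower bound of the ROR interval exceeds 1. All four cells must be non-zero for these formulas; zero cells need a continuity correction, which this sketch omits.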
Duarte, S. A.; Mehdiabadi, M.; Bugnon, L. A.; Aspromonte, M. C.; Piovesan, D.; Milone, D. H.; Tosatto, S.; Stegmayer, G.
Intrinsically disordered proteins (IDPs) play an important role in a wide range of biological functions and are linked to several diseases. Due to the technical difficulty and high cost of determining protein disorder experimentally, combined with the exponential increase in unannotated protein sequences, the development of computational methods for disorder prediction has become an active area of research over the last few decades. In this work, we present emb2dis, a deep learning model that uses protein language models (pLMs) to predict disorder from sequence. The emb2dis tool is a pre-trained model that receives a protein sequence as input, calculates its pLM embedding, and passes it to a deep learning model. In contrast to existing approaches, emb2dis integrates informative sequence representations with a novel architecture that combines residual networks (ResNets) and dilated convolutions. This design effectively enlarges the receptive field of the convolution operation, enabling the model to better capture an extended context of each amino acid. At the output, emb2dis assigns a disorder propensity score to each residue in the sequence. The model was evaluated on datasets from the latest CAID3 blind benchmark for disorder prediction, where it achieved first place in the Disorder-PDB category, exhibiting strong performance with high AUC and Fmax scores. Additionally, it ranked among the top ten methods on the Disorder-NOX dataset. We provide a freely available web-demo for emb2dis and a source code repository for local installation. Weblink for the tool: https://sinc.unl.edu.ar/web-demo/emb2dis/ The emb2dis tool provides a new deep learning approach and significant improvements in the prediction of protein disorder, with a simple web interface and graphical output detailing per-residue disorder.
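The claim that dilated convolutions enlarge the receptive field can be made concrete with a short calculation. For a stack of stride-1 one-dimensional convolutions, each layer with kernel size k and dilation d adds (k - 1) * d positions to the receptive field. The Python sketch below applies that rule; the kernel size and dilation schedule are hypothetical and do not describe the actual emb2dis architecture.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in residues, of stacked stride-1 1D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Four layers with kernel size 3: plain versus exponentially dilated.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])
```

With the same number of layers and parameters, the dilated stack sees 31 residues around each position instead of 9, which is the sense in which dilation extends the context available for each amino acid.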
Down, T.; Warowny, M.; Walker, A.; D'Ascenzo, L.; Lee, D.; Zhou, Z.; Cao, S.; Bainbridge, T. W.; Nicoludis, J. M.; Harris, S. F.; Mukhyala, K.
As computational tools and machine learning models for protein sciences continue to advance and proliferate, bench scientists face increasing technical challenges adopting these tools for specific applications such as drug discovery. Here we present GYDE (Guide Your Design and Engineering), an open-source, versatile, and web-based collaboration platform designed to make computational analyses of proteins and antibodies easily accessible to bench scientists. GYDE enables the exploration of sequence-structure-function relationships through a tightly integrated visual interface, offering researchers a comprehensive exploration of protein functional determinants either via real assay data or computational tools. GYDE's intuitive interface facilitates seamless access to cutting-edge AI models for protein and antibody structure prediction, design, and downstream analyses. The flexible and easy addition of new tools and models is facilitated by the use of the Slivka compute API. The platform supports saved sessions that enable researchers to easily share their findings with other users, fostering a more collaborative scientific community. GYDE is freely available for protein scientists in academia and industry to build drug discovery analytics platforms customized to their needs.
Zhang, L.; Wang, L.; Sun, X.; Tang, W.; Su, H.; Qian, Y.; Yang, Q.; Li, Q.; Tang, Z.; Sun, H.; Han, Y.; Jiang, Y.; Lou, W.; Zhou, B.; Wang, X.; Bai, L.; Xie, Z.
Computational drug discovery, particularly the complex workflows of drug molecule screening and optimization, requires orchestrating dozens of specialized tools in multi-step workflows, yet current AI agents struggle to maintain robust performance and consistently underperform in these high-complexity scenarios. Here we present MolClaw, an autonomous agent that leads drug molecule evaluation, screening, and optimization. It unifies over 30 specialized domain resources through a three-tier hierarchical skill architecture (70 skills in total) that facilitates agent long-term interaction at runtime: tool-level skills standardize atomic operations, workflow-level skills compose them into validated pipelines with quality check and reflection, and a discipline-level skill supplies scientific principles governing planning and verification across all scenarios in the field. Additionally, we introduce MolBench, a benchmark comprising molecular screening, optimization, and end-to-end discovery challenges spanning 8 to 50+ sequential tool calls. MolClaw achieves state-of-the-art performance across all metrics, and ablation studies confirm that gains concentrate on tasks that demand structured workflows while vanishing on those solvable with ad hoc scripting, establishing workflow orchestration competence as the primary capability bottleneck for AI-driven drug discovery.
Raspin, K.; Bartlett, L.; Makin, J.; Wilson, R.; Butorac, K.; Roydhouse, J.; Dickinson, J. L.
BACKGROUND: Prostate cancer (PrCa) is the most commonly diagnosed cancer in men in many countries and is the most heritable of the common cancers. Precision medicine approaches to disease management are not routinely available to most men, yet we know that germline genetic testing can help identify those at high risk of developing advanced or lethal disease and can influence selection of therapeutics. An integral part of healthcare delivery design is the inclusion of patients/consumers in the development of frameworks for managing health interventions that are tailored to meet their needs. METHODS: In Phase I, we undertook focus group discussions with men previously diagnosed with PrCa (n=20), to determine their opinions, perceptions and expectations of germline genetic testing for PrCa. Focus groups were tape-recorded, transcribed verbatim, coded and then thematically analysed using NVivo. In Phase II, themes were then used to design and develop a Precision Medicine in Prostate Cancer Information Toolkit, which was reviewed by patients (n=14), their carers/family members (n=4) and healthcare providers (n=14). RESULTS: In Phase I, knowledge about precision medicine and genetic testing was generally low. The strongest motivation for undertaking testing was to identify family members' risk levels (n=7), and the biggest concern pertained to insurance discrimination (n=5). Phase II data revealed that healthcare providers (n=8) generally found the purpose of the toolkit to be clearer than patients did (n=5). However, patients found it quite difficult to imagine how useful the toolkit would have been at the time of diagnosis, or beforehand when assessing genetic risk. Participants highlighted that information regarding life insurance, implications for their family and costs associated with testing were of concern. 
CONCLUSIONS: This study has revealed critical knowledge gaps, preferred communication/support needs, and concerns/risks associated with germline genetic testing in PrCa. Concerns pertaining to family members and insurance discrimination are obvious topics that need to be addressed. Our toolkit may be helpful in addressing knowledge gaps, but further testing is needed to ensure its accessibility across literacy and cultural contexts.
Broster, J. H.; Popovic, B.; Kondinskaia, D.; Deane, C. M.; Imrie, F.
Molecular docking aims to predict the binding conformation of a small molecule to its protein target. Recent work has proposed diffusion models for this task, from rigid-body docking that diffuses over ligand degrees of freedom to co-folding approaches that jointly generate protein structure and ligand pose. However, diffusion-based docking models have been shown to frequently produce physically implausible poses and fail to consistently recover key protein-ligand interactions. To address this, we introduce a reinforcement learning framework for training diffusion-based docking models directly on non-differentiable objectives. Fine-tuning DiffDock-Pocket for physical validity with our approach substantially increases the number of generated poses that are physically valid and interaction-preserving, with no increase in inference-time compute. Importantly, this comes without sacrificing structural accuracy; in fact, our approach increases the proportion of structures with near-native poses. These effects are most pronounced for protein targets that are dissimilar to the training data. Our fine-tuned DiffDock-Pocket model outperforms both classical docking algorithms and machine learning-based approaches on the PoseBusters set. Our results demonstrate that reinforcement learning can teach diffusion-based docking models to better respect physical constraints and recover key interactions, without the requirement to rely on inference-time corrections.
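To illustrate why reinforcement learning suits non-differentiable objectives, the following is a minimal sketch using a score-function (REINFORCE) estimator. It is not the authors' algorithm: a one-dimensional Gaussian sampler stands in for the diffusion model, and `is_valid`, `reinforce`, `valid_fraction`, and all numeric settings are hypothetical names and values chosen for illustration. The point is only that the update needs the reward's value, never its gradient.

```python
import random


def is_valid(pose):
    """Hypothetical hard check: reward 1 if the toy 'pose' passes, else 0.

    The step function has no useful gradient, which is the situation
    the abstract describes for physical-validity objectives.
    """
    return 1.0 if abs(pose - 3.0) < 1.0 else 0.0


def reinforce(steps=200, batch_size=256, learning_rate=0.5, sigma=1.0, seed=0):
    """Shift the mean of a Gaussian sampler toward poses that pass is_valid."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(steps):
        poses = [rng.gauss(mu, sigma) for _ in range(batch_size)]
        rewards = [is_valid(p) for p in poses]
        # Batch-mean baseline, subtracted to reduce gradient variance.
        baseline = sum(rewards) / batch_size
        # Score-function estimator:
        # d/d(mu) log N(x; mu, sigma) = (x - mu) / sigma^2
        grad = sum(
            (r - baseline) * (p - mu) / sigma**2
            for r, p in zip(rewards, poses)
        ) / batch_size
        mu += learning_rate * grad
    return mu


def valid_fraction(mu, sigma=1.0, n=2000, seed=1):
    """Estimate the share of sampled poses that pass the hard check."""
    rng = random.Random(seed)
    return sum(is_valid(rng.gauss(mu, sigma)) for _ in range(n)) / n
```

Calling `reinforce()` and comparing `valid_fraction` before and after training shows the sampler producing more passing samples, with sampling cost per pose unchanged.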
Poelmans, R.; Bruncsics, B.; Arany, A.; Van Eynde, W.; Shemy, A.; Moreau, Y.; Voet, A. R.
Knowledge-based potentials (KBPs) have long been used to score protein-ligand interactions, yet existing formulations remain isotropic, capturing only distance dependencies and neglecting the directional preferences that govern molecular recognition. Here, we introduce Direction-Enhanced Scoring POTentials (DESPOT), an anisotropic knowledge-based framework that unifies pose scoring and binding-site characterisation within a single probabilistic model. The new probabilistic formulation used in DESPOT naturally supports directional modelling through atom type-specific local reference frames and symmetry-aware geometric discretisation. It also supports steric exclusion, encoded as a dedicated void state that explicitly captures the probability that a spatial bin remains unoccupied. The anisotropic interaction profiles learned by DESPOT reveal systematic directional preferences for interactions such as hydrogen bonds, aromatic interactions, and halogen bonds, that extend beyond idealised geometric models. Evaluation on the CASF-2016 benchmark shows that DESPOT substantially outperforms isotropic KBPs in all pose-discrimination and virtual screening tasks (p << 0.0001 for all enrichment factors), with the largest gains arising from its ability to penalise geometrically implausible poses. Constrained energy minimisation of training structures proves strongly beneficial for the derivation of KBPs, while our train-test leakage analysis reveals that overfitting is an underestimated and understudied issue for KBPs. DESPOT provides a data-driven framework for direction-aware modelling of protein-ligand interactions, with applications in pose scoring, binding-site characterisation, and structure-based design.
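For readers unfamiliar with the isotropic baseline that the abstract contrasts against, the sketch below shows a classic distance-only KBP for a single atom-type pair, derived by inverse Boltzmann statistics as `-ln(p_obs / p_ref)` per distance bin. This is not DESPOT's formulation, which the abstract describes as directional and including a void state. The function names, bin width, cutoff, and pseudocount are illustrative assumptions.

```python
import math


def isotropic_kbp(observed, reference, bin_width=0.5, r_max=6.0, pseudocount=1.0):
    """Per-bin pseudo-energies -ln(p_obs / p_ref) for one atom-type pair.

    observed:  contact distances (angstroms) seen in known complexes
    reference: distances drawn from a chosen reference state
    """
    n_bins = round(r_max / bin_width)

    def histogram(distances):
        # Pseudocounts keep empty bins from producing log(0).
        counts = [pseudocount] * n_bins
        for d in distances:
            if 0.0 <= d < r_max:
                counts[min(int(d / bin_width), n_bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p_obs = histogram(observed)
    p_ref = histogram(reference)
    return [-math.log(po / pr) for po, pr in zip(p_obs, p_ref)]


def score_contacts(distances, energies, bin_width=0.5):
    """Sum bin energies over a pose's contact distances; lower is better."""
    total = 0.0
    for d in distances:
        index = int(d / bin_width)
        if 0 <= index < len(energies):
            total += energies[index]
    return total
```

Because the score depends only on distance, two contacts at the same separation receive the same energy whatever their relative orientation, which is the limitation the abstract points to.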
Whittle, E. F.; Montgomery, K.-A.; Camps, C.; Elkhateeb, N.; Ryan, C.; Aguti, S.; de Guimaraes, T. A. C.; Kini, U.; Stewart, H.; Douglas, A. G. L.; Wilson, L.; Leitch, H. G.; Lynch, D. S.; Robinson, R.; Michaelides, M.; Yu, T. W.; Gissen, P.; Lauffer, M. C.; Lench, N.; O'Connor, D.; Tavares, A. L.; Sanders, S. J.; Kurian, M. A.; Titheradge, H.; Clement, E.; van der Spuy, J.; Taylor, J. C.; Rinaldi, C.; Muntoni, F.; Zhou, H.; Davidson, A. E.; Ryten, M.; UPNAT consortium,
Background: Nucleic acid therapies (NATs) comprise engineered DNA- or RNA-based medicines that act through sequence-specific interactions to modify gene function. Among these, antisense oligonucleotide (ASO) therapies are designed to bind messenger RNA (mRNA) or pre-mRNA to alter splicing, transcript stability, or translation. Many patients with a rare genetic disease stand to benefit from these treatments and, as underlying technologies continue to advance, a critical barrier to care is the equitable selection of targets and patients. Owing to landmark progress in genomic health care, the UK is uniquely positioned to develop a national framework on NAT patient-selection infrastructure. The UK Platform for Nucleic Acid Therapies (UPNAT) has been launched, in part, to meet this goal, with a key output being a structured patient and target selection framework to support NAT development and clinical application, using ASO therapies as a pilot modality. Methodology and Results: A multidisciplinary panel of UK-based experts established the UPNAT framework to enable systematic assessment of ASO amenability across modular domains encompassing disease understanding, functional models, variant characteristics, and the individual patient, incorporating the recently published N1C VARIANT guidelines. This modular structure supports consistent prioritisation of tractable targets while identifying biological, clinical, technical, or evidentiary gaps currently limiting ASO development. Designed for implementation within the UK healthcare infrastructure and amenable to future automation using open-access resources, the framework was iteratively refined through application to genomic and clinical data from approved ASO therapies and selected real-world patient case studies.
Conclusion: We present the first disease-agnostic framework to support structured prioritisation of patients and targets (diseases, genes, or variants) for ASO development and consideration within specialist healthcare services. Designed to accommodate rapid technological advances in NATs, the framework promotes transparent, equitable, and reproducible decision-making within the UK National Health Service (NHS), with principles transferable to other healthcare systems.
Poelmans, R.; Van Eynde, W.; Bruncsics, B.; Bruncsics, B.; Arany, A.; Moreau, Y.; Voet, A. R.
The development of machine learning models for protein-ligand interactions is fundamentally constrained by the quality and diversity of available structural data. Existing databases of protein-ligand complexes present researchers with an unsatisfying trade-off: carefully curated collections such as PDBBind and HiQBind offer high structural reliability but cover only a narrow slice of the Protein Data Bank (PDB), while large-scale resources like PLInder provide broad coverage at the expense of rigorous quality control. Here, we introduce CROWN (Curated Repository Of Well-resolved Non-covalent interactions), a machine learning-ready dataset that reconciles this tension by applying a comprehensive, fully automated preprocessing pipeline to the PLInder database. Starting from 649,915 protein-ligand interaction systems, CROWN applies a series of interleaved quality filters and processing stages addressing crystallographic resolution, ligand identity, pocket completeness, structural repair, interaction quality, and protonation at physiological pH. A distinguishing feature of the pipeline is a final constrained energy minimisation step using custom flat-bottomed restraints, which balances crystallographic evidence with relaxation of intramolecular strain. This step -- absent from existing protein-ligand datasets -- produces structurally uniform complexes by reconciling the heterogeneous refinement practices of different crystallographers and structure determination protocols, without distorting the experimentally observed binding geometry. The resulting dataset of 153,005 complexes represents a roughly four-fold increase in protein and species diversity over PDBBind and HiQBind, while maintaining rigorous structural standards.
Importantly, CROWN adopts a geometry-centric design philosophy that treats the 3D arrangement of atoms at the binding interface as a self-consistent source of information, rather than relying on externally measured binding affinities that cover only a fraction of known structures and introduce well-documented biases. We anticipate that CROWN will serve as a broadly useful resource for training generative models of protein-ligand binding poses, developing scoring functions, and benchmarking interaction prediction methods.
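The flat-bottomed restraints mentioned in the CROWN abstract can be pictured with the generic form sketched below: an atom moves freely within a tolerance of its crystallographic position and is penalised only beyond it. The abstract does not give CROWN's functional form or parameters, so the harmonic wall, the 0.5 angstrom tolerance, the force constant, and the function names are all assumptions for illustration.

```python
import math


def flat_bottom_restraint(displacement, tolerance=0.5, force_constant=10.0):
    """Zero penalty inside the tolerance, harmonic penalty beyond it.

    displacement: distance of an atom from its reference position
    tolerance and force_constant are illustrative values, not CROWN's.
    """
    excess = max(0.0, displacement - tolerance)
    return 0.5 * force_constant * excess**2


def total_restraint_energy(coords, reference_coords, tolerance=0.5, force_constant=10.0):
    """Sum the flat-bottomed penalty over matching atoms of two structures."""
    return sum(
        flat_bottom_restraint(math.dist(c, ref), tolerance, force_constant)
        for c, ref in zip(coords, reference_coords)
    )
```

With these assumed values, a displacement of 0.3 angstroms costs nothing, while 1.5 angstroms costs 5.0 energy units. This is how such a restraint can allow local strain to relax while discouraging large departures from the observed geometry.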